Western Province
Hybrid Predictive Modeling of Malaria Incidence in the Amhara Region, Ethiopia: Integrating Multi-Output Regression and Time-Series Forecasting
Azezew, Kassahun, Tesema, Amsalu, Mekuria, Bitew, Kassie, Ayenew, Embiale, Animut, Salau, Ayodeji Olalekan, Asresa, Tsega
Malaria remains a major public health concern in Ethiopia, particularly in the Amhara Region, where seasonal and unpredictable transmission patterns make prevention and control challenging. Accurately forecasting malaria outbreaks is essential for effective resource allocation and timely interventions. This study proposes a hybrid predictive modeling framework that combines time-series forecasting, multi-output regression, and conventional regression-based prediction to forecast the incidence of malaria. Environmental variables, past malaria case data, and demographic information from Amhara Region health centers were used to train and validate the models. The multi-output regression approach enables the simultaneous prediction of multiple outcomes, including Plasmodium species-specific cases, temporal trends, and spatial variations, whereas the hybrid framework captures both seasonal patterns and correlations among predictors. The proposed model exhibits higher prediction accuracy than single-method approaches, exposing hidden patterns and providing valuable information to public health authorities. This study provides a valid and repeatable malaria incidence prediction framework that can support evidence-based decision-making, targeted interventions, and resource optimization in endemic areas.
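As a rough illustration of the multi-output idea described in the abstract above, the following minimal sketch fits one least-squares line per outcome against a single shared predictor. It is not the authors' model: the predictor (monthly rainfall), the two outcome names, and every number are synthetic assumptions chosen so the arithmetic is easy to check by hand.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Fit y ~ intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_multi_output(x, targets):
    """Fit one line per target; all targets share the same predictor x."""
    return {name: fit_simple_ols(x, y) for name, y in targets.items()}


def predict_multi_output(model, x_new):
    """Return one prediction per target for a new predictor value."""
    return {name: a + b * x_new for name, (a, b) in model.items()}


# Synthetic toy data (assumption, not taken from the study).
rainfall_mm = [10.0, 20.0, 30.0, 40.0]
cases = {
    "falciparum": [25.0, 45.0, 65.0, 85.0],  # constructed as 5 + 2.0 * rainfall
    "vivax": [12.0, 17.0, 22.0, 27.0],       # constructed as 7 + 0.5 * rainfall
}

model = fit_multi_output(rainfall_mm, cases)
forecast = predict_multi_output(model, 50.0)
print(forecast)
```

A real multi-output model would use many predictors and could share structure across outcomes; fitting independent lines is only the simplest case.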
- Africa > Ethiopia (0.73)
- Africa > Nigeria (0.04)
- South America > Brazil (0.04)
- (3 more...)
Building low-resource African language corpora: A case study of Kidawida, Kalenjin and Dholuo
Mbogho, Audrey, Awuor, Quin, Kipkebut, Andrew, Wanzare, Lilian, Oloo, Vivian
Natural Language Processing is a crucial frontier in artificial intelligence, with broad applications in many areas, including public health, agriculture, education, and commerce. However, due to the lack of substantial linguistic resources, many African languages remain underrepresented in this digital transformation. This paper presents a case study on the development of linguistic corpora for three under-resourced Kenyan languages, Kidaw'ida, Kalenjin, and Dholuo, with the aim of advancing natural language processing and linguistic research in African communities. Our project, which lasted one year, employed a selective crowd-sourcing methodology to collect text and speech data from native speakers of these languages. Data collection involved (1) recording conversations and translation of the resulting text into Kiswahili, thereby creating parallel corpora, and (2) reading and recording written texts to generate speech corpora. We made these resources freely accessible via open-research platforms, namely Zenodo for the parallel text corpora and Mozilla Common Voice for the speech datasets, thus facilitating ongoing contributions and access for developers to train models and develop Natural Language Processing applications. The project demonstrates how grassroots efforts in corpus building can support the inclusion of African languages in artificial intelligence innovations. In addition to filling resource gaps, these corpora are vital in promoting linguistic diversity and empowering local communities by enabling Natural Language Processing applications tailored to their needs. As African countries like Kenya increasingly embrace digital transformation, developing indigenous language resources becomes essential for inclusive growth. We encourage continued collaboration from native speakers and developers to expand and utilize these corpora.
- Africa > South Sudan (0.14)
- Africa > Uganda (0.05)
- North America > United States (0.04)
- (17 more...)
- Health & Medicine (0.67)
- Media > News (0.46)
Uchaguzi-2022: A Dataset of Citizen Reports on the 2022 Kenyan Election
Mondini, Roberto, Kotonya, Neema, Logan, Robert L. IV, Olson, Elizabeth M, Lungati, Angela Oduor, Odongo, Daniel Duke, Ombasa, Tim, Lamba, Hemank, Cahill, Aoife, Tetreault, Joel R., Jaimes, Alejandro
Online reporting platforms have enabled citizens around the world to collectively share their opinions and report in real time on events impacting their local communities. Systematically organizing (e.g., categorizing by attributes) and geotagging large amounts of crowdsourced information is crucial to ensuring that accurate and meaningful insights can be drawn from this data and used by policy makers to bring about positive change. These tasks, however, typically require extensive manual annotation efforts. In this paper we present Uchaguzi-2022, a dataset of 14k categorized and geotagged citizen reports related to the 2022 Kenyan General Election containing mentions of election-related issues such as official misconduct, vote count irregularities, and acts of violence. We use this dataset to investigate whether language models can assist in scalably categorizing and geotagging reports, thus highlighting its potential application in the AI for Social Good space.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Africa > Kenya > Bomet County > Bomet (0.05)
- (34 more...)
Bayesian Counterfactual Prediction Models for HIV Care Retention with Incomplete Outcome and Covariate Information
Oganisian, Arman, Hogan, Joseph, Sang, Edwin, DeLong, Allison, Mosong, Ben, Fraser, Hamish, Mwangi, Ann
Like many chronic diseases, human immunodeficiency virus (HIV) is managed over time at regular clinic visits. At each visit, patient features are assessed, treatments are prescribed, and a subsequent visit is scheduled. There is a need for data-driven methods for both predicting retention and recommending scheduling decisions that optimize retention. Prediction models can be useful for estimating retention rates across a range of scheduling options. However, training such models with electronic health records (EHR) involves several complexities. First, formal causal inference methods are needed to adjust for observed confounding when estimating retention rates under counterfactual scheduling decisions. Second, competing events such as death preclude retention, while censoring events render retention missing. Third, inconsistent monitoring of features such as viral load and CD4 count leads to covariate missingness. This paper presents an all-in-one approach for both predicting HIV retention and optimizing scheduling while accounting for these complexities. We formulate and identify causal retention estimands in terms of potential return-time under a hypothetical scheduling decision. Flexible Bayesian approaches are used to model the observed return-time distribution while accounting for competing and censoring events and to form posterior point and uncertainty estimates for these estimands. We address the urgent need for data-driven decision support in HIV care by applying our method to EHR from the Academic Model Providing Access to Healthcare (AMPATH), a consortium of clinics that treat HIV in Western Kenya.
- Africa > Kenya > Western Province (0.24)
- Africa > Kenya > Trans-Nzoia County > Kitale (0.04)
- Africa > South Africa (0.04)
- (4 more...)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology > HIV (1.00)
- Information Technology > Modeling & Simulation (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)
Detection of Malaria Vector Breeding Habitats using Topographic Models
Treatment of stagnant water bodies that act as breeding sites for malarial vectors is a fundamental step in most malaria elimination campaigns. However, identifying such water bodies over large areas is expensive, labour-intensive and time-consuming, and hence challenging in countries with limited resources. Practical models that can efficiently locate water bodies can help target the limited resources by greatly reducing the area that needs to be scanned by field workers. To this end, we propose a practical topographic model based on easily available, global, high-resolution DEM data to predict locations of potential vector-breeding water sites. We surveyed the Obuasi region of Ghana to assess the impact of various topographic features on different types of water bodies and uncover the features that significantly influence the formation of aquatic habitats. We further evaluate the effectiveness of multiple models. Our best model significantly outperforms earlier attempts that employ topographic variables for detection of small water sites, even those that utilize additional satellite imagery data, and demonstrates robustness across different settings.
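The abstract does not say which topographic features the model uses. As one hedged illustration of how a digital elevation model (DEM) can hint at where water may collect, the sketch below flags local depressions: grid cells strictly lower than every neighbouring cell. The function name `find_sinks` and the toy elevation grid are assumptions for this example, not part of the paper.

```python
def find_sinks(dem):
    """Return (row, col) of cells strictly lower than all existing neighbours.

    `dem` is a rectangular list of lists of elevations. Cells on the border
    are compared only against the neighbours that exist inside the grid.
    """
    rows, cols = len(dem), len(dem[0])
    sinks = []
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                dem[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
                and 0 <= r + dr < rows
                and 0 <= c + dc < cols
            ]
            if all(dem[r][c] < n for n in neighbours):
                sinks.append((r, c))
    return sinks


# Toy elevation grid in metres (synthetic, for illustration only).
toy_dem = [
    [5, 5, 5, 5],
    [5, 2, 4, 5],
    [5, 4, 3, 5],
    [5, 5, 5, 5],
]

sinks = find_sinks(toy_dem)
print(sinks)
```

In the toy grid only the cell with elevation 2 qualifies; the cell with elevation 3 is excluded because it has a lower diagonal neighbour.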
- North America > United States (0.29)
- Africa > Ghana (0.25)
- Africa > Kenya > Western Province (0.05)
- (3 more...)
Development of Semantics-Based Distributed Middleware for Heterogeneous Data Integration and its Application for Drought
Drought is a complex environmental phenomenon that affects millions of people and communities all over the globe and is too elusive to be accurately predicted. This is mostly due to the scalability and variability of the web of environmental parameters that directly or indirectly cause the onset of different categories of drought. Since the dawn of man, efforts have been made to uniquely understand the natural indicators that provide signs of likely environmental events. These indicators and signs, in the form of indigenous knowledge systems, have been used for generations. The intricate complexity of drought has, however, always been a major stumbling block for accurate drought prediction and forecasting systems. Recently, scientists in the field of agriculture and environmental monitoring have been discussing the integration of indigenous knowledge and scientific knowledge for a more accurate environmental forecasting system in order to incorporate diverse environmental information for a reliable drought forecast. Hence, in this research, the core objective is the development of a semantics-based data integration middleware that encompasses and integrates heterogeneous data models of local indigenous knowledge and sensor data towards an accurate drought forecasting system for the study areas. The local indigenous knowledge on drought gathered from the domain experts is transformed into rules to be used for performing deductive inference in conjunction with sensor data for determining the onset of drought through an automated inference generation module of the middleware. The semantic middleware incorporates, inter alia, a distributed architecture that consists of a streaming data processing engine based on Apache Kafka for real-time stream processing; a rule-based reasoning module; and an ontology module for semantic representation of the knowledge bases.
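The rule-based reasoning step described above can be pictured with a small forward-chaining sketch: sensor readings are turned into symbolic facts, and rules fire repeatedly until no new conclusion can be derived. Everything specific here is hypothetical, including the thresholds, the fact names, and the two rules; the middleware's actual rule base, ontology, and Kafka pipeline are not reproduced.

```python
def sensor_facts(readings):
    """Map numeric sensor readings to symbolic facts (thresholds are assumed)."""
    facts = set()
    if readings.get("soil_moisture", 1.0) < 0.2:
        facts.add("low_soil_moisture")
    if readings.get("rainfall_mm", 100.0) < 5.0:
        facts.add("low_rainfall")
    return facts


def forward_chain(facts, rules):
    """Apply (premises, conclusion) rules until a fixed point is reached."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


# Hypothetical rules: the second one combines a sensor-derived conclusion
# with an indigenous-knowledge observation supplied as a fact.
RULES = [
    (frozenset({"low_soil_moisture", "low_rainfall"}), "dry_conditions"),
    (frozenset({"dry_conditions", "indigenous_indicator_observed"}),
     "drought_onset_likely"),
]

readings = {"soil_moisture": 0.1, "rainfall_mm": 2.0}
with_indicator = forward_chain(
    sensor_facts(readings) | {"indigenous_indicator_observed"}, RULES
)
without_indicator = forward_chain(sensor_facts(readings), RULES)
print(sorted(with_indicator))
```

The second rule only fires once the first has added `dry_conditions`, which is why the loop repeats until nothing changes.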
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.13)
- Africa > Sub-Saharan Africa (0.04)
- (50 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- Personal (0.92)
- Health & Medicine (1.00)
- Government (1.00)
- Food & Agriculture > Agriculture (1.00)
- (3 more...)
Rural Kenyans power West's AI revolution. Now they want more
Naivasha, Kenya – Caroline Njau comes from a family of farmers who tend to fields of maize, wheat, and potatoes in the hilly terrain near Nyahururu, 180 kilometres (112 miles) north of the capital Nairobi. But Njau has chosen a different path in life. Seated in her living room with a cup of milk tea, she labels data for artificial intelligence (AI) companies abroad on an app. The sun rises over the unpaved streets of her neighbourhood as she flicks through images of tarmac roads, intersections and sidewalks on her smartphone while carefully drawing boxes around various objects: traffic lights, cars, pedestrians, and signposts. The designer of the app – an American subcontractor to Silicon Valley companies – pays her $3 an hour.
- Africa > Kenya > Nairobi City County > Nairobi (0.29)
- North America > United States > California (0.26)
- Africa > South Africa (0.06)
- (9 more...)
- Information Technology (0.71)
- Transportation > Ground > Road (0.70)
- Transportation > Infrastructure & Services (0.55)
Causal Machine Learning for Cost-Effective Allocation of Development Aid
Kuzmanovic, Milan, Frauen, Dennis, Hatt, Tobias, Feuerriegel, Stefan
The Sustainable Development Goals (SDGs) of the United Nations provide a blueprint of a better future by 'leaving no one behind', and, to achieve the SDGs by 2030, poor countries require immense volumes of development aid. In this paper, we develop a causal machine learning framework for predicting heterogeneous treatment effects of aid disbursements to inform effective aid allocation. Specifically, our framework comprises three components: (i) a balancing autoencoder that uses representation learning to embed high-dimensional country characteristics while addressing treatment selection bias; (ii) a counterfactual generator to compute counterfactual outcomes for varying aid volumes to address small sample-size settings; and (iii) an inference model that is used to predict heterogeneous treatment-response curves. We demonstrate the effectiveness of our framework using data with official development aid earmarked to end HIV/AIDS in 105 countries, amounting to more than USD 5.2 billion. For this, we first show that our framework successfully computes heterogeneous treatment-response curves using semi-synthetic data. Then, we demonstrate our framework using real-world HIV data. Our framework points to large opportunities for a more effective aid allocation, suggesting that the total number of new HIV infections could be reduced by up to 3.3% (~50,000 cases) compared to the current allocation practice.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Africa > Mozambique (0.04)
- Asia > India (0.04)
- (101 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology > HIV (1.00)
Kencorpus: A Kenyan Language Corpus of Swahili, Dholuo and Luhya for Natural Language Processing Tasks
Wanjawa, Barack, Wanzare, Lilian, Indede, Florence, McOnyango, Owen, Ombui, Edward, Muchemi, Lawrence
Indigenous African languages are categorized as under-served in Natural Language Processing. They therefore experience poor digital inclusivity and information access. The processing challenge with such languages has been how to use machine learning and deep learning models without the requisite data. The Kencorpus project intends to bridge this gap by collecting and storing text and speech data that is good enough for data-driven solutions in applications such as machine translation, question answering and transcription in multilingual communities. The Kencorpus dataset is a text and speech corpus for three languages predominantly spoken in Kenya: Swahili, Dholuo and Luhya. Data collection was done by researchers from communities, schools, media, and publishers. The Kencorpus dataset has a collection of 5,594 items: 4,442 texts (5.6M words) and 1,152 speech files (177 hours). Based on this data, Part of Speech tagging sets for Dholuo and Luhya (50,000 and 93,000 words respectively) were developed. We developed 7,537 Question-Answer pairs for Swahili and created a text translation set of 13,400 sentences from Dholuo and Luhya into Swahili. The datasets are useful for downstream machine learning tasks such as model training and translation. We also developed two proof-of-concept systems: one for Kiswahili speech-to-text and a machine learning system for the Question Answering task, with results of 18.87% word error rate and 80% Exact Match (EM) respectively. These initial results give great promise to the usability of Kencorpus to the machine learning community. Kencorpus is one of few public domain corpora for these three low-resource languages and forms a basis of learning and sharing experiences for similar works, especially for low-resource languages.
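For readers unfamiliar with the two metrics quoted above, the sketch below shows how word error rate (WER) and Exact Match (EM) are commonly computed. It is a generic illustration, not the Kencorpus evaluation code; the lowercasing and whitespace stripping used for EM are assumptions, and the example strings are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def exact_match(predictions, gold_answers):
    """Fraction of predictions equal to the gold answer after normalization."""
    def normalize(text):
        return text.strip().lower()  # assumed normalization
    hits = sum(
        normalize(p) == normalize(g) for p, g in zip(predictions, gold_answers)
    )
    return hits / len(gold_answers)


wer_substitution = word_error_rate("habari ya asubuhi", "habari za asubuhi")
wer_deletion = word_error_rate("a b c d", "a b d")
em_score = exact_match(
    ["Nairobi", "mombasa ", "Kisumu"], ["nairobi", "Mombasa", "Nakuru"]
)
print(wer_substitution, wer_deletion, em_score)
```

Under this definition, a WER of 18.87% corresponds to roughly 19 word-level errors per 100 reference words, and an EM of 80% means four in five predicted answers match the gold answer exactly.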
- Africa > East Africa (0.14)
- Africa > Kenya > Nairobi City County > Nairobi (0.04)
- Europe > Finland > Uusimaa > Helsinki (0.04)
- (20 more...)
- Education (1.00)
- Media > News (0.93)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
- Health & Medicine > Therapeutic Area > Immunology (0.46)
Adaptive Interventions for Global Health: A Case Study of Malaria
Periáñez, África, Trister, Andrew, Nekkar, Madhav, del Río, Ana Fernández, Alonso, Pedro L.
Malaria can be prevented, diagnosed, and treated; however, every year, there are more than 200 million cases and 200,000 preventable deaths. Malaria remains a pressing public health concern in low- and middle-income countries, especially in sub-Saharan Africa. We describe how, by means of mobile health applications, machine-learning-based adaptive interventions can strengthen malaria surveillance and treatment adherence, increase testing, measure provider skills and quality of care, improve public health by supporting front-line workers and patients (e.g., by capacity building and encouraging behavioral changes, like using bed nets), reduce test stockouts in pharmacies and clinics, and inform public health policy interventions.
- Africa > Sub-Saharan Africa (0.24)
- Africa > Malawi (0.14)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- (20 more...)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)